ABSTRACT

The ever-increasing amount of information flowing through 
Social Media forces the members of these networks to com- 
pete for attention and influence by relying on other people 
to spread their message. A large study of information propa- 
gation within Twitter reveals that the majority of users act 
as passive information consumers and do not forward the 
content to the network. Therefore, in order for individuals 
to become influential they must not only obtain attention 
and thus be popular, but also overcome user passivity. We 
propose an algorithm that determines the influence and pas- 
sivity of users based on their information forwarding activ- 
ity. An evaluation performed with a 2.5 million user dataset 
shows that our influence measure is a good predictor of URL 
clicks, outperforming several other measures that do not ex- 
plicitly take user passivity into account. We also explicitly 
demonstrate that high popularity does not necessarily imply 
high influence and vice versa.

Wojciech Galuba
EPFL, Distributed Information Systems Lab
Lausanne, Switzerland
wojciech.galuba@epfl.ch

Bernardo A. Huberman
Social Computing Lab, HP Labs
Palo Alto, California, USA
bernardo.huberman@hp.com

1. INTRODUCTION

The explosive growth of Social Media has provided millions
of people the opportunity to create and share content on a
scale barely imaginable a few years ago. Massive participation
in these social networks is reflected in the countless
opinions, news items and product reviews that are constantly
posted and discussed on social sites such as Facebook,
Digg and Twitter, to name a few. Given this widespread
generation and consumption of content, it is natural to
target one's messages to highly connected people who will
propagate them further in the social network. This is
particularly the case in Twitter, which is one of the fastest
growing social networks on the Internet, and thus the focus of
advertising companies and celebrities eager to exploit this
vast new medium. As a result, ideas, opinions, and products
compete with all other content for the scarce attention of the
user community. In spite of the seemingly chaotic fashion in
which all these interactions take place, certain topics manage
to get an inordinate amount of attention, thus bubbling
to the top in terms of popularity and contributing to new
trends and to the public agenda of the community. How
this happens in a world where crowdsourcing dominates is
still an unresolved problem, but there is considerable
consensus that two aspects of information transmission
seem to be important in determining which content receives
inordinate amounts of attention.

One is the popularity and status of given members of these 
social networks, which is measured by the level of atten- 
tion they receive in the form of followers who create links 
to their accounts to automatically receive the content they 
generate. The other is the influence that these individuals 
wield, which is determined by the actual propagation of their 
content through the network. This influence is determined 
by many factors, such as the novelty and resonance of their 
messages with those of their followers and the quality and 
frequency of the content they generate. Equally important 
is the passivity of members of the network which provides a 
barrier to propagation that is often hard to overcome. Thus 
gaining knowledge of the identity of influential and least pas- 
sive people in a network can be extremely useful from the 
perspectives of viral marketing, propagating one's point of 
view, as well as setting which topics dominate the public 
agenda. 

In this paper, we analyze the propagation of web links on 
Twitter over time to understand how attention to given users
and their influence is determined. We devise a general model 
for influence using the concept of passivity in a social net- 
work and develop an efficient algorithm similar to the HITS 
algorithm [12] to quantify the influence of all the users in the 
network. Our influence measure utilizes both the structural 
properties of the network as well as the diffusion behavior 
among users. The influence of a user thus depends on not 
only the size of the influenced audience, but also on their 
passivity. This differentiates it from earlier measures of
influence, which were primarily based on individual statistical
properties such as the number of followers or retweets [7].



We have shown through our extensive evaluation that 
this influence model outperforms other measures of influ- 
ence such as PageRank, H-index, the number of followers 
and the number of retweets. In addition it has good predic- 
tive properties in that it can forecast in advance the upper 
bound on the number of clicks a URL can get. We have 
also presented case studies showing the top influential users 
uncovered by our algorithm. An important conclusion from 
the results is that the correlation between popularity and 
influence is quite weak, with the most influential users not 
necessarily the ones with the highest popularity. Addition- 
ally, when we considered nodes with high passivity, we found 
a majority of them to be spammers and robot users. This 
demonstrates an application of our algorithm in automatic 
user categorization and filtering of online content.

2. RELATED WORK 

The study of information and influence propagation in 
social networks has been particularly active for a number 
of years in fields as disparate as sociology, communication, 
marketing, political science and physics. Earlier work fo- 
cused on the effects that scale-free networks and the affinity 
of their members for certain topics had on the propagation 
of information Rj]. Others discussed the presence of key 
influentials 12 Ly |8j |5j [To] in a social network, defined as 
those who are responsible for the overall information dis- 
semination in the network. This research highlighted the 
value of highly connected individuals as key elements in the 
propagation of information through the network. 

Huberman et al. [2] studied the social interactions on 
Twitter to reveal that the driving process for usage is a 
sparse hidden network underlying the friends and followers,
while most of the links represent meaningless interactions.
Jansen et al. [3] have examined Twitter as a mechanism
for word-of-mouth advertising. They considered particular
brands and products and examined the structure of
the postings and the change in sentiments. Galuba et al. [2]
propose a propagation model that predicts which users will
tweet about which URL based on the history of past user
activity.

There have also been earlier studies focused on social in- 
fluence and propagation. Agarwal et al. [§] have examined 
the problem of identifying influential bloggers in the blogo- 
sphere. They discovered that the most influential bloggers 
were not necessarily the most active. Aral et al. [9] have
distinguished the effects of homophily from influence as
motivators for propagation. As to the study of influence within
Twitter, Cha et al. [7] have performed a comparison of three
different measures of influence: indegree, retweets and user
mentions. They discovered that while retweets and mentions
correlated well with each other, the indegree of users
did not correlate well with the other two measures. Based
on this, they hypothesized that the number of followers may
not be a good measure of influence. On the other hand, Weng
et al. [5] have proposed a topic-sensitive PageRank measure
for influence in Twitter. Their measure is based on the fact
that they observed high reciprocity among follower
relationships in their dataset, which they attributed to homophily.
However, other work [7] has shown that reciprocity is
low overall in Twitter, contradicting the assumptions of
this work.

3. TWITTER 



3.1 Background on Twitter 

Twitter is an extremely popular online microblogging service
that has gained a very large user base, consisting of
more than 105 million users (as of April 2010). The Twitter
graph is a directed social network, where each user chooses 
to follow certain other users. Each user submits periodic 
status updates, known as tweets, that consist of short mes- 
sages limited in size to 140 characters. These updates typi- 
cally consist of personal information about the users, news 
or links to content such as images, video and articles. The 
posts made by a user are automatically displayed on the
user's profile page, as well as shown to the user's followers.

A retweet is a post originally made by one user that is for- 
warded by another user. Retweets are useful for propagating 
interesting posts and links through the Twitter community. 

Twitter has attracted considerable attention from corporations
for the immense potential it provides for viral marketing.
Due to its huge reach, Twitter is increasingly used by news
organizations to disseminate news updates, which are then
filtered and commented on by the Twitter community. A
number of businesses and organizations are using Twitter
or similar microblogging services to advertise products and
disseminate information to stockholders.

3.2 Dataset 

Twitter provides a Search API for extracting tweets con- 
taining particular keywords. To obtain the dataset for this 
study, we continuously queried the Twitter Search API for 
a period of 300 hours starting on 10 Sep 2009 for all tweets 
containing the string http. This allowed us to acquire a 
continuous stream of 22 million tweets with URLs, which 
we estimated to be 1/15th of the entire Twitter activity at
that time. From each of the accumulated tweets, we ex- 
tracted the URL mentions. Each of the unique 15 million 
URLs in the data set was then checked for valid formatting 
and the URLs shortened via services such as bit.ly or
tinyurl.com were expanded into their original form by
following the HTTP redirects. For each encountered unique
user ID, we queried the Twitter API for metadata about 
that user and in particular the user's followers and followees. 
The end result was a dataset of timestamped URL mentions 
together with the complete social graph for the users con- 
cerned. 

User graph. The user graph contains those users whose 
tweets appeared in the stream, i.e., users that during the 
300 hour observation period posted at least one public tweet 
containing a URL. The graph does not contain any users who 
do not mention any URLs in their tweets or users that have 
chosen to make their Twitter stream private. 

For each newly encountered user ID, the list of followed 
users was only fetched once. Our data set does not capture 
the changes occurring in the user graph over the observation 
period. 

4. THE IP ALGORITHM 

Evidence for passivity. The users that receive information
from other users may never see it or may choose to ignore
it. We have quantified the degree to which this occurs on
Twitter (Fig. 1). An average Twitter user retweets only one
in 318 URLs, which is a relatively low value. The retweeting
rates vary widely across users, and the small number of
the most active users play an important role in spreading
information in Twitter. This suggests that the level of user
passivity should be taken into account for information
spread models to be accurate.

Figure 1: Evidence for Twitter user passivity.
We measure passivity by two metrics: 1. the user
retweeting rate and 2. the audience retweeting rate.
The user retweeting rate is the ratio of the
number of URLs that user i decides to retweet to
the total number of URLs user i received from the
followed users. The audience retweeting rate is the
ratio of the number of user i's URLs that were
retweeted by i's followers to the number of times a
follower of i received a URL from i.
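The two retweeting rates used as evidence for passivity can be sketched as follows. This is a minimal illustration, not the measurement code used for the study: the input structures (`followers`, `posted`, `retweets`) and the function name are hypothetical, and each delivery of a URL to a follower is counted once per followed poster, which is one possible reading of "the number of URLs received".

```python
from collections import defaultdict


def retweeting_rates(followers, posted, retweets):
    """Return (user_rate, audience_rate) dictionaries.

    followers: dict user -> set of users who follow that user (assumed input)
    posted:    dict user -> set of URLs the user posted (assumed input)
    retweets:  iterable of (source, retweeter, url) triples (assumed input)
    """
    received = defaultdict(int)    # URL deliveries arriving at each user
    deliveries = defaultdict(int)  # URL deliveries each user made to followers
    for user, urls in posted.items():
        for follower in followers.get(user, ()):
            received[follower] += len(urls)
            deliveries[user] += len(urls)

    retweeted_by = defaultdict(int)  # retweets made by each user
    retweeted_of = defaultdict(int)  # retweets of a user's URLs by followers
    for source, retweeter, _url in retweets:
        retweeted_by[retweeter] += 1
        if retweeter in followers.get(source, ()):
            retweeted_of[source] += 1

    # Rates are only defined for users with a non-zero denominator.
    user_rate = {u: retweeted_by[u] / n for u, n in received.items() if n}
    audience_rate = {u: retweeted_of[u] / n for u, n in deliveries.items() if n}
    return user_rate, audience_rate
```

A user who never receives a URL has no user retweeting rate under this sketch, and a user without followers has no audience retweeting rate.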

Assumptions. Twitter is used by many people as a tool 
for spreading their ideas, knowledge, or opinions to others. 
An interesting and important question is whether it is pos- 
sible to identify those users who are very good at spreading 
their content, not only to those who choose to follow them, 
but to a larger part of the network. It is often fairly easy 
to obtain information about the pairwise influence relation- 
ships between users. In Twitter, for example, one can mea- 
sure how much influence user A has on user B by counting 
the number of times B retweeted A. However, it is not very 
clear how to use the pairwise influence information to accu- 
rately obtain information about the relative influence each 
user has on the whole network. To answer this question we 
design an algorithm (IP) that assigns a relative influence 
score and a passivity score to every user. The passivity of 
a user is a measure of how difficult it is for other users to 
influence him. We assume that the influence of a user de- 
pends on both the quantity and the quality of the audience 
she influences. In general, our model makes the following 
assumptions: 

1. A user's influence score depends on the number of peo- 
ple she influences as well as their passivity. 

2. A user's influence score depends on how dedicated the 
people she influences are. Dedication is measured by 
the amount of attention a user pays to a given one as 
compared to everyone else. 

3. A user's passivity score depends on the influence of 
those who she's exposed to but not influenced by. 

4. A user's passivity score depends on how much she
rejects other users' influence compared to everyone else.



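Algorithm 1: The Influence-Passivity (IP) algorithm

$I_0 \leftarrow (1, 1, \ldots, 1) \in \mathbb{R}^{|N|}$;
$P_0 \leftarrow (1, 1, \ldots, 1) \in \mathbb{R}^{|N|}$;
for i = 1 to m do
    Update $P_i$ using operation (2) and the values $I_{i-1}$;
    Update $I_i$ using operation (1) and the values $P_i$;
    for j = 1 to |N| do
        $I_j = \frac{I_j}{\sum_{k \in N} I_k}$;
        $P_j = \frac{P_j}{\sum_{k \in N} P_k}$;
    end
end
Return $(I_m, P_m)$;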



Operation. The algorithm iteratively computes both the 
passivity and influence scores simultaneously in a similar 
manner as in the HITS algorithm for finding authoritative 
web pages and hubs that link to them [14].

Given a weighted directed graph G = (N, E, W) with
nodes N, arcs E, and arc weights W, where the weight
$w_{ij}$ on arc e = (i, j) represents the ratio of the influence
that i exerts on j to the total influence that i attempted to
exert on j, the IP algorithm outputs a function
$I : N \to [0, 1]$, which represents the nodes' relative
influence on the network, and a function $P : N \to [0, 1]$,
which represents the nodes' relative passivity in the network.

For every arc e = (i, j) ∈ E, we define the acceptance rate by

$$u_{ij} = \frac{w_{ij}}{\sum_{k:(k,j) \in E} w_{kj}}.$$

This value represents the amount of influence that user j
accepted from user i normalized by the total influence
accepted by j from all users on the network. The acceptance
rate can be viewed as the dedication or loyalty user j has to
user i. On the other hand, for every e = (j, i) ∈ E we define
the rejection rate by

$$v_{ji} = \frac{1 - w_{ji}}{\sum_{k:(j,k) \in E} (1 - w_{jk})}.$$

Since the value $1 - w_{ji}$ is the amount of influence that
user i rejected from j, the value $v_{ji}$ represents the
influence that user i rejected from user j normalized by the
total influence rejected from j by all users in the network.
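A small worked example may help fix the two normalizations. The weights below are invented for illustration and do not come from the dataset. Suppose user j has exactly two incoming arcs, from a and b, and user a has exactly two outgoing arcs, to j and k.

```latex
% Hypothetical weights: w_{aj} = 0.2, w_{bj} = 0.6, w_{ak} = 0.5.
% Acceptance rates normalize over the arcs entering j:
\[
u_{aj} = \frac{w_{aj}}{w_{aj} + w_{bj}} = \frac{0.2}{0.8} = 0.25,
\qquad
u_{bj} = \frac{w_{bj}}{w_{aj} + w_{bj}} = \frac{0.6}{0.8} = 0.75.
\]
% Rejection rates normalize over the arcs leaving a:
\[
v_{aj} = \frac{1 - w_{aj}}{(1 - w_{aj}) + (1 - w_{ak})}
       = \frac{0.8}{1.3} \approx 0.615,
\qquad
v_{ak} = \frac{1 - w_{ak}}{(1 - w_{aj}) + (1 - w_{ak})}
       = \frac{0.5}{1.3} \approx 0.385.
\]
```

In this example j devotes three quarters of its accepted influence to b, while most of the influence rejected from a is rejected by j.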

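The algorithm is based on the following operations:

$$I_i \leftarrow \sum_{j:(i,j) \in E} u_{ij} P_j \qquad (1)$$

$$P_i \leftarrow \sum_{j:(j,i) \in E} v_{ji} I_j \qquad (2)$$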

Each term on the right hand side of the above operations
corresponds to one of the listed assumptions. In operation
(1) the term $P_j$ corresponds to assumption 1 and the term
$u_{ij}$ corresponds to assumption 2. In operation (2) the term
$I_j$ corresponds to assumption 3 and the term $v_{ji}$
corresponds to assumption 4. The Influence-Passivity algorithm
(Algorithm 1) takes the graph G as the input and computes the
influence and passivity for each node in m iterations.
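The iteration described above can be sketched in a few lines of Python. This is an illustrative reading of the algorithm under stated assumptions, not the authors' implementation: the graph is assumed to be given as a dictionary mapping arcs (i, j) to weights in [0, 1], the function name and the default of 50 iterations are hypothetical, and arcs whose normalizing sum is zero are simply skipped.

```python
from collections import defaultdict


def influence_passivity(weights, m=50):
    """Run m rounds of the Influence-Passivity updates.

    weights: dict (i, j) -> w_ij, arc from influencer i to influenced j.
    Returns (influence, passivity), each a dict node -> score summing to 1.
    """
    nodes = sorted({n for arc in weights for n in arc})
    accepted = defaultdict(float)  # sum of w_kj over arcs entering j
    rejected = defaultdict(float)  # sum of (1 - w_jk) over arcs leaving j
    for (i, j), w in weights.items():
        accepted[j] += w
        rejected[i] += 1.0 - w

    # Acceptance rate u_ij and rejection rate v_ji, keyed by the arc itself.
    u = {(i, j): w / accepted[j]
         for (i, j), w in weights.items() if accepted[j] > 0}
    v = {(j, i): (1.0 - w) / rejected[j]
         for (j, i), w in weights.items() if rejected[j] > 0}

    influence = dict.fromkeys(nodes, 1.0)
    passivity = dict.fromkeys(nodes, 1.0)
    for _ in range(m):
        # Passivity update: P_i <- sum over arcs (j, i) of v_ji * I_j.
        new_p = dict.fromkeys(nodes, 0.0)
        for (j, i), v_ji in v.items():
            new_p[i] += v_ji * influence[j]
        # Influence update: I_i <- sum over arcs (i, j) of u_ij * P_j.
        new_i = dict.fromkeys(nodes, 0.0)
        for (i, j), u_ij in u.items():
            new_i[i] += u_ij * new_p[j]
        # Normalize both vectors so that each sums to 1.
        total_i = sum(new_i.values()) or 1.0
        total_p = sum(new_p.values()) or 1.0
        influence = {n: x / total_i for n, x in new_i.items()}
        passivity = {n: x / total_p for n, x in new_p.items()}
    return influence, passivity
```

Note that in this sketch a node with no outgoing arcs receives zero influence and a node with no incoming arcs receives zero passivity, which follows directly from the two summations.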



Generating the input graph. There are many ways of
defining the influence graph G = (N, E, W). We choose to
construct it by taking into account retweets and the follower
graph in the following way: The nodes are users who tweeted
at least 3 URLs. The arc (i, j) exists if user j retweeted a
URL posted by user i at least once. The arc e = (i, j) has
weight $w_e = \frac{S_{ij}}{Q_i}$, where $Q_i$ is the number of
URLs that i mentioned and $S_{ij}$ is the number of URLs
mentioned by i and retweeted by j.
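The construction of the retweet influence graph can be sketched as below. The input format is an assumption made for illustration: `mentions` maps each user to the set of URLs the user mentioned, and `retweets` lists (source, retweeter, URL) triples; neither name comes from the paper.

```python
from collections import defaultdict


def retweet_influence_graph(mentions, retweets, min_urls=3):
    """Return dict (i, j) -> S_ij / Q_i for the retweet influence graph.

    mentions: dict user -> set of URLs mentioned by that user (assumed input)
    retweets: iterable of (i, j, url): j retweeted url from i (assumed input)
    """
    # Nodes are users who tweeted at least min_urls distinct URLs.
    nodes = {u for u, urls in mentions.items() if len(urls) >= min_urls}

    shared = defaultdict(set)  # (i, j) -> URLs of i that j retweeted
    for i, j, url in retweets:
        if i in nodes and j in nodes and i != j and url in mentions[i]:
            shared[(i, j)].add(url)

    # S_ij is the size of the shared set, Q_i the number of URLs i mentioned.
    return {(i, j): len(urls) / len(mentions[i])
            for (i, j), urls in shared.items()}
```

Repeated retweets of the same URL are counted once here, so the weight stays within [0, 1].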

5. EVALUATION 

5.1 Computations 

Based on the obtained dataset (§3.2), we generate the
weighted graph using the method described in §4. The
graph consists of approximately 450k nodes and 1 million
arcs with a mean weight of 0.07, and we use it to compute
the PageRank, influence and passivity values for each node.
The Influence-Passivity algorithm (Algorithm 1) converges
to the final values in tens of iterations (Fig. 2).

PageRank. The PageRank algorithm has been widely
used to rank web pages as well as people based on their
authority and their influence [13] [5]. In order to compare it
with the results from the IP algorithm, we compute PageRank
on the weighted graph G = (N, E, W) with a small
change. First, since an arc e = (i, j) ∈ E indicates that
user i exerts some influence on user j, we invert all the
arcs before running PageRank on the graph while leaving
the weights intact. In other words, we generate a new graph
G' = (N', E', W') where N' = N, E' = {(i, j) : (j, i) ∈ E},
and for each (i, j) ∈ E' we define $w'_{ij} = w_{ji}$. This
generates a new graph G' analogous to G but where the influenced
users point to their influencers. Second, since the graph
G' is weighted, we assume that when the random surfer
of the PageRank algorithm is currently at the node i, she
chooses to visit node j next with probability

$$\frac{w'_{ij}}{\sum_{k:(i,k) \in E'} w'_{ik}}.$$
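The weighted PageRank baseline on the inverted graph can be sketched as a plain power iteration. Several details below are assumptions not stated in the text: the damping factor of 0.85, the fixed number of iterations, and the uniform redistribution of rank from nodes without outgoing arcs in G'. The function name is hypothetical.

```python
def weighted_pagerank_inverted(weights, damping=0.85, iterations=100):
    """PageRank on G', the arc-inverted version of the influence graph G.

    weights: dict (i, j) -> w_ij on G, arc from influencer i to influenced j.
    Returns dict node -> rank, with ranks summing to 1.
    """
    nodes = sorted({n for arc in weights for n in arc})
    if not nodes:
        return {}
    # In G' the influenced user j points to influencer i with weight w_ij.
    out = {n: {} for n in nodes}
    for (i, j), w in weights.items():
        out[j][i] = w

    n = len(nodes)
    rank = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        new = dict.fromkeys(nodes, (1.0 - damping) / n)
        for src, targets in out.items():
            total = sum(targets.values())
            if total == 0:
                # Assumed handling of dangling nodes: spread rank uniformly.
                share = damping * rank[src] / n
                for t in nodes:
                    new[t] += share
            else:
                # Surfer moves to t with probability w'_{src,t} / total.
                for t, w in targets.items():
                    new[t] += damping * rank[src] * w / total
        rank = new
    return rank
```

Because the arcs are inverted, rank flows from influenced users toward the users who influence them.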

The Hirsch Index. The Hirsch index (or H-index) is 
used in the scientific community in order to measure the 
productivity and impact of a scientist. A scientist has index 
h if he has published h articles which have been cited at least 
h times each. It has been shown that the H-index is a good 
indicator of whether a scientist has had high achievements 
such as getting the Nobel prize uM- Analogously, in Twitter, 
a user has index h if h of his URL posts have been retweeted 
at least h times each. 
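The Twitter analogue of the H-index follows directly from the definition above. The sketch below assumes the input is simply a list with the number of retweets each of a user's URL posts received; the function name is illustrative.

```python
def h_index(retweet_counts):
    """Largest h such that h posts were retweeted at least h times each."""
    counts = sorted(retweet_counts, reverse=True)
    h = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h
```

Sorting in descending order means the loop can stop at the first post whose retweet count falls below its rank.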

5.2 Influence as a correlate of attention 

Any measure of influence is necessarily a subjective one. 
However, in this case, a good measure of influence should 
have a high predictive power on how well the URLs men- 
tioned by the influential users attract attention and propa- 
gate in the social network. We would expect the URLs that 
highly influential users propagate to attract a lot of atten- 
tion and user clicks. Thus, a viable estimator of attention is 
the number of times a URL has been accessed. 

Click data. Bit.ly is a URL shortening service that for 
each shortened URL keeps track of how many times it has 
been accessed. For the 3.2M Bit.ly URLs found in the tweets 
we have queried the Bit.ly API for the number of clicks the 
service has registered on that URL. 




Figure 2: IP-algorithm convergence (x-axis: iteration). In each
iteration we measure the sum of all the absolute changes
of the computed influence and passivity values since
the previous iteration.



URL traffic correlation. Using the URL click data, 
we take several different user attributes and test how well 
they can predict the attention the URLs posted by the users 
receive (Fig. 3). It is important to note that none of the
influence measures are capable of predicting the exact num- 
ber of clicks. The main reason for this is that the amount of 
attention a URL gets is not only a function of the influence 
of the users mentioning it, but also of many other factors 
including the virality of the URL itself and more impor- 
tantly, whether the URL was mentioned anywhere outside 
of Twitter, which is likely to be the biggest source of unpre- 
dictability in the click data. The click data that we collected 
represents the total clicks on the URLs. 

The wide range of factors potentially affecting the Bit.ly 
clicks may prevent us from predicting their number accu- 
rately. However, the upper bound on that number can to 
a large degree be predicted. To eliminate the outlier cases, 
we examined how the 99.9th percentile of the clicks varied 
as the measure of influence increased. 
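One way to trace such an upper bound is to group URLs by the value of the influence measure and take a high percentile of the clicks within each group. The sketch below is illustrative only: the logarithmic binning, the nearest-rank percentile definition and both function names are assumptions, since the text does not specify how the percentile curve was computed.

```python
import math
from collections import defaultdict


def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) of values using the nearest-rank rule."""
    ordered = sorted(values)
    k = max(1, math.ceil(q * len(ordered) / 100.0))
    return ordered[k - 1]


def upper_bound_curve(points, q=99.9, bin_width=1.0):
    """Percentile of clicks per logarithmic bin of the attribute.

    points: iterable of (attribute, clicks) pairs, one per URL (assumed input).
    Returns dict bin index -> q-th percentile of clicks in that bin, where
    bin index b covers attributes in [10**(b*bin_width), 10**((b+1)*bin_width)).
    """
    bins = defaultdict(list)
    for attribute, clicks in points:
        if attribute <= 0:
            continue  # the logarithm is undefined; skip such points
        bins[math.floor(math.log10(attribute) / bin_width)].append(clicks)
    return {b: percentile_nearest_rank(c, q) for b, c in sorted(bins.items())}
```

A linear fit through the resulting per-bin values would then play the role of the regression line described for the figures.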

Number of followers. The most readily available measure
of influence, and the one most often used by Twitter users,
is the number of followers a user has. As Figure 3(a) shows,
the number of followers of an average poster of a given URL
is a relatively weak predictor of the maximum number of
clicks that the URL can receive.

Number of retweets. When users post URLs their 
posts might be retweeted by other users. Each retweet ex- 
plicitly credits the original poster of the URL (or the user 
from whom the retweeting user heard about the URL). The 
number of times a user has been credited in a retweet has 
been assumed to be a good measure of influence W\. However,
Figure 3(b) shows that the number of times a user has
been retweeted in the past is a poor predictor of the max- 
imum number of clicks the URLs posted by that user can 
get. 

PageRank. Figure 3(c) shows that the average PageRank
of those who tweet a certain URL does not correlate
well with the number of clicks the URL will get. One of the
main differences between the IP algorithm and PageRank is
that the IP algorithm takes into account the passivity of the
people a user influences and PageRank does not. IP-influence
is a much better indicator of URL popularity than PageRank.
This suggests that influencing users who are difficult to
influence, as opposed to simply influencing many users, has
a positive impact on the eventual popularity of the message
that a user tweets.

(c) Average user PageRank vs. number of clicks on URLs
(d) Average user H-index vs. number of clicks on URLs
(e) Average user IP-influence vs. number of clicks on URLs,
    using the retweet graph as input
(f) Average user IP-influence vs. number of clicks on URLs,
    using the co-mention graph as input

Figure 3: We consider several user attributes: the number of followers, the number of times a user has been
retweeted, the user's PageRank, H-index and IP-influence. For each of the 3.2M Bit.ly URLs we compute
the average value of a user's attribute among all the users that mentioned that URL. This value becomes the
x coordinate of the URL-point; the y coordinate is the number of clicks on the Bit.ly URL. The density of
the URL-points is then plotted for each of the four user attributes. The solid line in each figure represents
the 99.9th percentile of Bit.ly clicks at a given attribute value. The dotted line is the linear regression fit for
the solid line.

Figure 4: For each user we place a user-point with
IP-influence as the y coordinate and the x coordinate
set to the number of user's followers. The density
of user-points is represented in grayscale.

Figure 5: The correlation between the IP-influence
values computed based on two inputs: the co-mention
influence graph and the retweet influence
graph.

The Hirsch Index. Figure 3(d) shows that despite the
fact that in the scientific community the H-index is used as a
good predictor of high achievements, in Twitter it does not 
correlate well with URL popularity. This may reflect the fact 
that attention in the scientific community plays a symmetric 
role, since those who pay attention to the work of others 
also seek it from the same community. Thus, citations play 
a strategic role in the successful publishing of papers, since 
the expectation of authors is that referees and authors will 
demand attention to their work and those of their colleagues. 
Within Social Media such symmetry does not exist and thus 
the decision to forward a message to the network lacks this 
particularly strategic value. 

IP-Influence score. As we can see in Figure 3(e), the
average IP-influence of those who tweeted a certain URL can
determine the maximum number of clicks that a URL will
get. Since the URL clicks are never considered by the IP
algorithm to compute the user's influence, the fact that we find
a very clear connection between average IP-influence and the
eventual popularity of the URLs (measured by clicks) serves
as an unbiased evaluation of the algorithm and exposes the
power of IP-influence. For example, as we can see in Figure
3(e), given a group of users having very large average
IP-influence scores who post a URL, we can estimate, with
99.9% certainty, that this URL will not receive more than
100,000 clicks. On the other hand, if a group of users with
very low average IP-influence score post the same URL, we
can estimate, with 99.9% certainty, that the URL will not
receive more than 100 clicks.

Furthermore, Figure 4 shows that a user's IP-influence is
not well correlated with the number of followers she has. 
This reveals interesting implications about the relationship 
between a person's popularity and the influence she has on 
other people. In particular, it shows that having many fol- 
lowers on Twitter does not imply power to influence them 
to even click on a URL. 



6. IP ALGORITHM ADAPTABILITY 

As mentioned earlier (§4), there are many ways of defining
a social graph in which the edges indicate pairwise influence.
We have so far been using the graph based on which user
retweeted which user (retweet influence graph). However,
explicit signals of influence such as retweets are not
always available. One way of overcoming this obstacle is
to use other, possibly weaker, signals of influence. In the
case of Twitter, we can define an influence graph based on
mentions of URLs without regard to actual retweeting in the
following way.

The co-mention graph. The nodes of the co-mention
influence graph are users who tweeted at least three URLs.
The edge (i, j) exists if user j follows user i and j mentioned
at least one URL that i had previously mentioned. The edge
e = (i, j) has weight $w_e = \frac{S_{ij}}{F_{ij} + S_{ij}}$,
where $F_{ij}$ is the number of URLs that i mentioned and j
never did and $S_{ij}$ is the number of URLs mentioned by j
and previously mentioned by i.
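The co-mention construction can be sketched in the same style. The inputs are assumptions made for illustration: `first_mention` maps each user to the time of their first mention of each URL, and `follows` maps each user to the set of users they follow. "Previously mentioned" is read here as a strictly earlier first mention.

```python
def co_mention_graph(first_mention, follows, min_urls=3):
    """Return dict (i, j) -> S_ij / (F_ij + S_ij) for the co-mention graph.

    first_mention: dict user -> dict url -> time of first mention (assumed)
    follows:       dict user -> set of users that user follows (assumed)
    """
    # Nodes are users who tweeted at least min_urls distinct URLs.
    nodes = {u for u, urls in first_mention.items() if len(urls) >= min_urls}

    weights = {}
    for j in nodes:
        for i in follows.get(j, ()):
            if i not in nodes or i == j:
                continue
            urls_i, urls_j = first_mention[i], first_mention[j]
            # S_ij: URLs mentioned by j that i had mentioned earlier.
            s = sum(1 for url, t in urls_i.items()
                    if url in urls_j and t < urls_j[url])
            # F_ij: URLs that i mentioned and j never did.
            f = sum(1 for url in urls_i if url not in urls_j)
            if s > 0:
                weights[(i, j)] = s / (f + s)
    return weights
```

URLs that j mentioned before i did count toward neither S nor F in this reading, so they do not affect the weight.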

The resulting graph has the disadvantage that the edges 
are based on a much less explicit notion of influence than 
when based on retweets. Therefore the graph could have 
edges between users who do not influence each other. On 
the other hand, the retweeting conventions on Twitter are 
not uniform and therefore sometimes users who repost a 
URL do not necessarily credit the correct source of the URL 
with a retweet [15] . Hence, the influence graph based on 
retweets has potentially missing edges. 

Since the IP algorithm has the flexibility of allowing any 
influence graph as input, we can compute the influence scores 
of the users based on the co-mention influence graph and 
compare with the results obtained from the retweet influence 
graph. As we can see in Figure 3(f), the retweet
graph yields influence scores that are better at predicting the
maximum number of clicks a URL will obtain than the
co-mention influence graph. Nevertheless, Figures 3(f), 3(c),
3(a) and 3(b) show that the influence values obtained from
the co-mention influence graph are still better at predicting
URL traffic than other measures such as PageRank, number
of followers, H-index or the total number of times a user
has been retweeted. Furthermore, Figure 5 shows that the
influence scores based on the two graphs do not correlate well,
which suggests that considering explicit vs. implicit signals
of influence can drastically change the outcome of the IP
algorithm. In general, we find that the explicitness of the
signal provided by the retweets yields slightly better results
when it comes to predicting URL traffic; however, the
influence scores based on co-mentions may surface a different
set of potentially influential users.

Table 1: Users with the most IP-influence (with at
least 10 URLs posted in the period)

    mashable          Social Media Blogger
    jokoanwar         Film Director
    google            Google News
    aplusk            Actor
    syfy              Science Fiction Channel
    smashingmag       Online Developer Magazine
    michellemalkin    Conservative Commentator
    theonion          News Satire Organization
    rww               Tech/Social Media Blogger
    breakingnews      News Aggregator

Table 2: Users with the most IP-passivity

    redscarebot       Keyword Aggregator
    drunk_bot         Suspended
    tea_robot         Keyword Aggregator
    condos            Listing Aggregator
    wootboot          Suspended
    raybeckerman      Attorney
    hashphotography   Keyword Aggregator
    charlieandsandy   Suspended
    ms_defy           Suspended
    rpattinsonbot     Keyword Aggregator

7. CASE STUDIES 

As we mentioned earlier, one important application of the 
IP algorithm is ranking users by their relative influence. In 
this section, we present a series of rankings of Twitter users 
based on the influence, passivity, and number of followers. 

The most influential. Table 1 shows the users with the
most IP-influence in the network. We restrict this list to
users who posted at least 10 URLs; it is dominated
by news services from politics, technology, and Social Media.
These users post many links which are forwarded by other 
users, causing their influence to be high. 

The most passive. Table 2 shows the users with the 
most IP-passivity in the network. Passive users are those 
who follow many people, but retweet a very small percent- 
age of the information they consume. Interestingly, robot 
accounts (which automatically aggregate keywords or spe- 
cific content from any user on the network), suspended ac- 
counts (which are likely to be spammers), and users who 
post extremely often are among the users with the most 
IP-passivity. Since robots "attend" to all existing tweets 
and only retweet certain ones, the percentage of informa- 
tion they forward from other users is actually very small. 
This explains why the IP-algorithm assigns them such high 
passivity scores. This also highlights a new application of 
the IP-algorithm: automatic identification of robot users in- 
cluding aggregators and spammers. 

The least influential with many followers. We have 
demonstrated that the amount of attention a person gets 
may not be a good indicator of the influence they have in 
spreading their message. In order to make this point more 
explicit, we show, in Table 3, some examples of users who 
are followed by many people but have relatively low influ- 
ence. These users are very popular and have the attention 
of millions of people but are not able to spread their mes- 
sage very far. In most cases, their messages are consumed 
by their followers but not considered important enough to 
forward to others. 

The most influential with few followers. We are
also able to identify users with a very low number of followers
but high influence. Table 4 shows the users with the most
influence who rank less than 100,000 in number of followers.
ers. We find that during the data collection period some 
of the users in this category ran very successful retweeting 
contests where users who retweeted their URLs would have 
the chance of winning a prize. Moreover, there is a group 
of users who post from Twitdraw.com, a website where people
can make drawings and post them on Twitter. Even
though these users don't have many followers, their draw- 
ings are of very high quality and spread throughout Twitter 
reaching many people. Other interesting users such as lo- 
cal politicians and political cartoonists are also found in the 
list. The IP-influence measure allows us to find interesting 
content posted by users who would otherwise be buried by 
popularity rankings such as number of followers. 
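The contrast between the two kinds of users can be made concrete by comparing the two ranks reported for each user. The sketch below uses a few rows copied from Tables 3 and 4; the `rank_gap` helper and its sign convention are illustrative assumptions, not a measure defined in the paper.

```python
# (user, rank by # followers, rank by IP-influence), copied from Tables 3 and 4.
rows = [
    ("thatkevinsmith", 33, 1000),
    ("nprpolitics", 41, 525),
    ("newsweek", 148, 756),
    ("cashcycle", 153286, 13),
    ("robmillerusmc", 147803, 145),
    ("fireflower_", 452832, 195),
]

def rank_gap(follower_rank, influence_rank):
    """Positive: ranked better on influence than on followers; negative: the reverse."""
    return follower_rank - influence_rank

gaps = {user: rank_gap(f_rank, i_rank) for user, f_rank, i_rank in rows}

more_popular_than_influential = [u for u, g in gaps.items() if g < 0]
more_influential_than_popular = [u for u, g in gaps.items() if g > 0]
```

The first list recovers the Table 3 rows and the second recovers the Table 4 rows, which is the separation that a ranking by follower count alone would hide.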

8. DISCUSSION 

Influence as predictor of attention. As we demonstrated
in §5, the IP-influence of the users is an accurate
predictor of the upper bound on the total number of clicks
they can get on the URLs they post. The input to the influence
algorithm is a weighted graph, where the arc weights
represent the influence of one user over another. This graph
can be derived from the user activity in many ways, even
in cases where explicit feedback in the form of retweets or
"likes" is not available (§6).
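One possible way to derive such a weighted graph from retweet activity is sketched below. It assumes, for illustration only, that the weight of the arc from an author to a retweeter is the fraction of the author's posted URLs that the retweeter forwarded; the function name and the input layout are hypothetical.

```python
from collections import defaultdict

def build_weighted_arcs(posts, retweets):
    """Build arc weights from posting and retweeting activity.

    posts:    dict author -> set of URLs the author posted
    retweets: iterable of (retweeter, author, url) events
    Returns a dict mapping (author, retweeter) -> weight in (0, 1].
    Assumption: weight = share of the author's URLs that the retweeter forwarded.
    """
    forwarded = defaultdict(set)
    for retweeter, author, url in retweets:
        # Ignore events that refer to URLs the author never posted.
        if url in posts.get(author, set()):
            forwarded[(author, retweeter)].add(url)
    return {
        arc: len(urls) / len(posts[arc[0]])
        for arc, urls in forwarded.items()
    }

# Hypothetical activity data.
posts = {"a": {"u1", "u2", "u3", "u4"}}
retweets = [("b", "a", "u1"), ("b", "a", "u2"), ("c", "a", "u1"), ("b", "a", "u1")]
arcs = build_weighted_arcs(posts, retweets)
```

Repeated retweets of the same URL are counted once because the forwarded URLs are stored in a set. Where retweets are unavailable, a different activity signal would have to stand in for the retweet events.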

Topic-based and group-based influence. The Influence-
Passivity algorithm can be run on a subgraph of the full
graph or on a subset of the user activity data. For example,
if only users tweeting about a certain topic are part
of the graph, the IP-influence determines the most influen- 
tial users in that topic. It is an open question whether the 
IP algorithm would be equally accurate at different graph 
scales. 
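Restricting the input to a topic or a group amounts to taking the subgraph induced by the selected users before running the algorithm. A minimal sketch, assuming the graph is stored as a dict from (source, target) pairs to weights (a hypothetical representation):

```python
def induced_subgraph(arcs, users):
    """Keep only the arcs whose two endpoints both belong to `users`."""
    keep = set(users)
    return {
        (source, target): weight
        for (source, target), weight in arcs.items()
        if source in keep and target in keep
    }

# Hypothetical weighted graph and topic membership.
arcs = {("a", "b"): 0.5, ("a", "c"): 0.25, ("c", "d"): 0.1}
topic_users = ["a", "b", "d"]
topic_arcs = induced_subgraph(arcs, topic_users)
```

Arcs that cross the boundary of the selected group are dropped, so any score computed on `topic_arcs` reflects forwarding activity inside the group only.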

Content ranking. The predictive power of IP-influence 
can be used for content filtering and ranking in order to 
reveal content that is most likely to receive attention based 
on which users mentioned that content early on. As in
the case of users, this can be computed on a per-topic
or per-user-group basis.
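A minimal sketch of such a ranking is given below. It assumes that per-user IP-influence scores are already available and scores each URL by summing the influence of its first `k` distinct mentioners; both the aggregation rule and the parameter `k` are illustrative assumptions rather than a method evaluated in the paper.

```python
def rank_content(mentions, influence, k=3):
    """Rank URLs by the summed influence of their first k distinct mentioners.

    mentions:  iterable of (timestamp, user, url)
    influence: dict user -> influence score (assumed precomputed)
    Returns (urls ordered by descending score, dict url -> score).
    """
    early = {}
    for _, user, url in sorted(mentions, key=lambda m: m[0]):
        users = early.setdefault(url, [])
        if user not in users and len(users) < k:
            users.append(user)
    scores = {
        url: sum(influence.get(u, 0.0) for u in users)
        for url, users in early.items()
    }
    return sorted(scores, key=scores.get, reverse=True), scores

# Hypothetical influence scores and mention events.
influence = {"a": 0.9, "b": 0.1, "c": 0.5}
mentions = [(1, "b", "x"), (2, "a", "y"), (3, "c", "x"), (4, "b", "y"), (5, "a", "x")]
ranking, scores = rank_content(mentions, influence, k=2)
```

With `k=2`, the late mention of `x` by the high-influence user `a` is ignored, so `y` ranks first.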

Content filtering. We have observed from our passivity 
experiments that highly passive users tend to be primarily 
robots or spammers. This leads to an interesting extension 
of this work to perform content filtering, limiting the tweets 
to influential users and thereby reducing spam in Twitter 
feeds. 
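The filtering idea can be sketched as a simple threshold on the author's influence score. The tweet layout, the threshold value, and the account names below are hypothetical, and a real filter would need a principled way of choosing the cutoff.

```python
def filter_feed(tweets, influence, min_influence):
    """Keep only tweets whose author has at least `min_influence`.

    tweets:    list of dicts with an "author" key
    influence: dict user -> influence score (assumed precomputed)
    Authors with no score are treated as having zero influence.
    """
    return [t for t in tweets if influence.get(t["author"], 0.0) >= min_influence]

# Hypothetical feed and scores.
influence = {"news_account": 0.8, "spam_bot": 0.01}
feed = [
    {"author": "news_account", "text": "story link"},
    {"author": "spam_bot", "text": "buy now"},
    {"author": "unknown_user", "text": "hello"},
]
filtered = filter_feed(feed, influence, min_influence=0.1)
```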

User name        Category                  Rank by # followers   Rank by IP-influence
thatkevinsmith   Screen Writer             33                    1000
nprpolitics      Political News            41                    525
eonline          TV Channel                42                    1008
marthastewart    Television Host           43                    1169
nba              Sports                    64                    1041
davidgregory     Journalist                106                   3630
nfl              Sports                    110                   2244
cbsnews          News Channel              114                   2278
jdickerson       Journalist                147                   4408
newsweek         News Magazine             148                   756

Table 3: Users with many followers and low relative influence

User name        Category                  Rank by # followers   Rank by IP-influence
cashcycle        Retweet Contest           153286                13
mobiliens        Retweet Contest           293455                70
jadermattos      Twitdraw                  227934                134
_jaum_           Twitdraw                  404385                143
robmillerusmc    Congressional Candidate   147803                145
sitekulite       Twitdraw                  423917                149
jesse_sublett    Musician                  385265                151
cyberaurora      Tech News Website         446207                163
viveraxo         Twitdraw                  458279                165
fireflower_      Political Cartoons        452832                195

Table 4: Users with very few followers but high relative influence

Influence dynamics. We have computed the influence
measures over a fixed 300-hour window. However, Social
Media is a rapidly changing, real-time communication
platform, which has several implications. First, the
IP algorithm would need to be modified to take tweet
timestamps into account. Second, IP-influence itself
changes over time, which raises a number of interesting
questions about the dynamics of influence and attention.
In particular, whether users with spikes of IP-influence are
overall more influential than users who can sustain their
IP-influence over time is an open question.
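One simple way to bring timestamps into play is to split the activity log into consecutive time windows and compute influence separately in each. The sketch below only performs the splitting step; the event layout and the window length are hypothetical, and this is not the timestamp-aware modification of the IP algorithm that the text leaves open.

```python
from collections import defaultdict

def split_into_windows(events, window_hours):
    """Group timestamped events into consecutive fixed-length windows.

    events:       iterable of (hour, retweeter, author, url), with hour >= 0
                  measured from the start of data collection
    window_hours: length of each window in hours
    Returns a dict mapping window index -> list of events in that window.
    """
    windows = defaultdict(list)
    for event in events:
        windows[int(event[0] // window_hours)].append(event)
    return dict(windows)

# Hypothetical retweet events.
events = [
    (0, "b", "a", "u1"),
    (299, "c", "a", "u1"),
    (300, "b", "a", "u2"),
    (650, "c", "a", "u3"),
]
windows = split_into_windows(events, window_hours=300)
```

Each window's events could then be used to build a separate weighted graph, giving a sequence of influence scores per user whose spikes or stability can be compared.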

9. CONCLUSION 

Given the mushrooming popularity of Social Media, vast 
efforts are devoted by individuals, governments and enter- 
prises to getting attention to their ideas, policies, products, 
and commentary through social networks. But the very 
large scale of the networks underlying Social Media makes 
it hard for any of these topics to get enough attention in 
order to rise to the most trending ones. Given this con- 
straint, there has been a natural shift on the part of the 
content generators towards targeting those individuals that 
are perceived as influential because of their large number 
of followers. This study shows that the correlation between
popularity and influence is weaker than might be expected.
This reflects the fact that for information to propagate
in a network, individuals need to forward it to other
members, which requires them to engage actively rather than
passively read it and take no further action. Moreover, since our
measure of influence is not specific to Twitter, it is applicable
to many other social networks. This opens the possibility
of discovering influential individuals within a network who
can, on average, have a further reach than others in the same
medium, regardless of their popularity.